# Interleaved Image-Text Processing
Xgen Mm Phi3 Mini Instruct Interleave R V1.5
Apache-2.0
xGen-MM is Salesforce AI Research's latest series of foundational large multimodal models (LMMs). It builds on the design of the BLIP series, with enhancements intended to provide a more robust model foundation.
Image-to-Text
Safetensors English
Salesforce
7,373
51
Xgen Mm Phi3 Mini Instruct R V1
xGen-MM is Salesforce AI Research's latest series of foundational large multimodal models. It improves on the BLIP series and offers strong image understanding and text generation capabilities.
Image-to-Text
Transformers English

Salesforce
804
186
Idefics2 8b
Apache-2.0
Idefics2 is an open-source multimodal model that accepts arbitrary sequences of image and text inputs and generates text outputs. Compared with the original IDEFICS, it shows significant improvements in OCR, document understanding, and visual reasoning.
Image-to-Text
Transformers English

HuggingFaceM4
14.99k
603
Idefics 9b Instruct
Other
IDEFICS is an open-source reproduction of Flamingo, DeepMind's proprietary visual language model. It is a multimodal model that accepts arbitrary sequences of images and text as input and generates text output.
Image-to-Text
Transformers English

HuggingFaceM4
28.34k
104